Assessment of EuroSCORE II and STS Score Performance and the Impact of Surgical Urgency in Isolated Coronary Artery Bypass Graft Surgery at a Referral Center in São Paulo, Brazil

Introduction: Risk prediction models, such as The Society of Thoracic Surgeons (STS) risk score and the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II), are recommended for assessing operative mortality in coronary artery bypass grafting (CABG). However, their performance is questionable in Brazil. Objective: To assess the performance of the STS score and EuroSCORE II in isolated CABG at a Brazilian reference center. Methods: Observational and prospective study including 438 patients undergoing isolated CABG from May 2022 to May 2023 at the Instituto Dante Pazzanese de Cardiologia. Observed mortality was compared with predicted mortality (STS score and EuroSCORE II) by discrimination (area under the curve [AUC]) and calibration (observed/expected ratio [O/E]) in the total sample and in the subgroups of stable coronary artery disease (CAD) and acute coronary syndrome (ACS). Results: Observed mortality was 4.3% (n=19), whereas mortality predicted by the STS score and EuroSCORE II was 1.21% and 2.74%, respectively. STS (AUC=0.646; 95% confidence interval [CI] 0.532-0.760) and EuroSCORE II (AUC=0.697; 95% CI 0.593-0.802) presented poor discrimination. Calibration was absent for the North American model (P<0.05) and reasonable for the European model (O/E=1.59, P=0.056). In the subgroups, EuroSCORE II had an AUC of 0.616 (95% CI 0.480-0.752) in ACS patients and 0.826 (95% CI 0.661-0.991) in stable CAD patients, while STS had an AUC of 0.467 (95% CI 0.312-0.622) in ACS patients and 0.855 (95% CI 0.706-1.0) in stable CAD patients, demonstrating good score performance in stable patients. Conclusion: The predictive models did not perform optimally in the total sample, but EuroSCORE II was superior, especially in elective stable patients, where accuracy was satisfactory.


INTRODUCTION
Coronary artery disease (CAD) is the leading cause of death in Brazil [1], and coronary artery bypass grafting (CABG) is the treatment of choice for many patients with severe CAD, being the most frequent cardiac surgery in this country [2] and worldwide (55% of cardiac surgeries according to data from large centers) [3]. Due to its high prevalence and inherent risks, operative risk assessment is essential. Risk prediction models, developed to estimate morbidity and mortality outcomes in cardiovascular surgeries, are widely used tools that offer significant assistance to health services, public policies, and medical management.
In this context, we highlight two risk prediction models with recognized accuracy that are recommended by current international guidelines [4,5] and are widely used in the medical community: The Society of Thoracic Surgeons (STS) risk score and the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II). The North American STS score model, created in the 1990s and last updated in 2018 from a sample of 439,092 surgeries [6], takes into account 65 complex variables and assesses operative mortality as well as eight morbidity-related outcomes [7]. Easier to manage (composed of 18 risk factors), the EuroSCORE II, developed in 2011 from 22,381 patients, assesses only operative mortality [8]. Both models have satisfactory performance in predicting mortality in the populations in which they were developed [6-9], but studies suggest superiority of the North American model [10,11].
However, they were developed and validated predominantly in populations whose characteristics differ from those found in Brazil and other countries. With regard to isolated CABG, the mortality reported by the STS registries in 2019 was 2.2% [12], while in Brazil, between 2005 and 2007, it was 6.22% [2], which is consistent with these inequalities. In line with this rationale, a study conducted in Turkey in 2013 compared EuroSCORE II, EuroSCORE, and STS in isolated CABG, showing that the first score underestimated mortality, estimated at 1.7% and observed at 7.9% [13]. Also, in a retrospective Brazilian study published in 2020 evaluating risk scores within a sample of 5,222 cardiac surgeries, Mejía OAV et al. [14] observed a mortality (7.6%) much higher than that estimated by the European (3.1%) and North American (1.0%) models.
It is evident, therefore, that the performance of a predictive model differs according to the population in which it is applied, making it necessary to assess its accuracy in the Brazilian population, which is characterized by specific clinical presentations, given the deep socioeconomic and cultural differences, as well as differences in the distribution of and access to health services [15,16]. Thus, this study aims to assess the performance in predicting operative mortality of the two main risk models currently used and recommended, the STS score and EuroSCORE II, in isolated myocardial revascularization surgeries at a reference center in Brazil, the Instituto Dante Pazzanese de Cardiologia (IDPC).

METHODS
This is an observational, prospective, single-center study conducted by collecting data from patients undergoing isolated CABG at IDPC.

Data Selection
All of the variables used to calculate the risk of mortality by the risk models analyzed in this study, the STS score (65 variables) and EuroSCORE II (18 variables), were obtained prior to surgery, in order to subsequently estimate the operative risk of patients using the two aforementioned models [17,18]. The outcome assessed was operative mortality, defined as death occurring during the surgical hospitalization (regardless of the length of hospital stay) or within 30 days after the surgical procedure if the patient was discharged from hospital before this period. In addition to these data, educational level and glycated hemoglobin (variables not present in the scores) were collected for further characterization of the study population.
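The outcome definition above can be written as a small decision rule. The following is a minimal sketch, not the study's data pipeline: the function name, the convention of counting days from the day of surgery (day 0), and the choice to treat day 30 as inclusive are all assumptions made for illustration.

```python
from typing import Optional


def is_operative_death(death_day: Optional[int],
                       discharge_day: Optional[int]) -> bool:
    """Classify a death as operative mortality (illustrative sketch).

    Days are counted from the day of surgery (day 0); this convention is
    an assumption, not something stated in the source.
    death_day: day of death, or None if the patient is alive.
    discharge_day: day of discharge, or None if never discharged.
    """
    if death_day is None:
        return False  # patient alive: no outcome
    # Death during the surgical hospitalization counts regardless of
    # how long the hospital stay lasted.
    if discharge_day is None or death_day <= discharge_day:
        return True
    # Death after discharge counts only within 30 days of surgery
    # (day 30 treated as inclusive here, which is an assumption).
    return death_day <= 30
```

Under this sketch, a death on day 45 in a patient who was never discharged counts as operative mortality, whereas a death on day 40 after discharge on day 10 does not.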

Variable Definition
The definition of each variable selected and collected follows exactly the recommendations of the EuroSCORE II [8] and STS score [19] predictor models. Creatinine clearance was calculated using the Cockcroft-Gault formula (the same as in EuroSCORE II). Among the data collected, the variables "Canadian Cardiovascular Society IV (CCS IV) angina", "extracardiac arteriopathy", "poor mobility", "recent infarction", and "critical preoperative state" are exclusive to the EuroSCORE II model, while "cerebrovascular disease", "heart failure (HF)", and "immunosuppression" are specific to the STS score. The definition and classification of surgical urgency are similar in both scores and were adopted in this study (elective: performed in routine admissions and can be delayed without causing additional cardiac risk; urgent: patient was not electively admitted for the procedure, which must be performed in the same hospitalization; emergency: must occur before the start of the next working day; salvage: under cardiopulmonary resuscitation). In contrast, "chronic lung disease" and "previous cardiac surgery" are variables present in both scores, but with different definitions. In this study, the specific definition of each variable was adopted for the calculation of the respective score, but for the description of the characteristics of the overall sample, "chronic lung disease" was defined according to EuroSCORE II (necessity of bronchodilator use) and "previous cardiac surgery" according to STS (any cardiac procedure, including percutaneous coronary intervention [PCI]).
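For reference, the Cockcroft-Gault estimate mentioned above can be sketched as follows. This is the standard published formula (creatinine in mg/dL, weight in kg, with a 0.85 factor for women); the function and parameter names are illustrative and do not come from the source.

```python
def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool) -> float:
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault).

    CrCl = (140 - age) * weight / (72 * serum creatinine),
    multiplied by 0.85 for women.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return clearance * 0.85 if female else clearance
```

For a hypothetical 60-year-old man weighing 72 kg with a serum creatinine of 1.0 mg/dL, the estimate is 80 mL/min; for a woman with the same values it is 68 mL/min.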

Sample Selection/Casuistry
All patients admitted to the IDPC with isolated CABG scheduled during the same hospitalization were selected for the sample and data collection, regardless of clinical status, degree of urgency, or surgical indication. Patients undergoing isolated CABG from May 2022 to May 2023 were prospectively included. If another surgical procedure was performed during the operation, the patient was excluded.

Statistical Analysis
Continuous variables were described by their means and standard deviations. Categorical variables were described using absolute and relative frequencies. The results concerning observed and predicted mortality by the risk models were analyzed to determine the performance of EuroSCORE II and the STS score (predictive validation of the models) by calibration (assessed by the observed to predicted mortality ratio, with satisfactory calibration when P>0.05) and discrimination (assessed by the area under the curve [AUC] of the receiver operating characteristic [ROC] curve, being adequate when closer to 1.0 and absent if < 0.5). Calibration of the models was also assessed within established risk ranges of predicted mortality (≤ 3%: low risk; from 3 to 6%: moderate risk; ≥ 6%: high risk) and in the specific subgroups of stable CAD and acute coronary syndrome (ACS), which included unstable angina and acute myocardial infarction. Results were expressed with 95% confidence intervals (CI). Analyses were conducted using R software, version 4.2.1.
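To make the discrimination measure and the risk ranges concrete, the sketch below computes the AUC as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor (ties counted as one half), and assigns a predicted mortality to a risk range. It is an illustration in Python, not the R code used in the study; the function names are hypothetical, and the handling of the exact 3% and 6% boundaries is an assumption based on the ranges stated above.

```python
from typing import Sequence


def auc(predicted_risk: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney)."""
    deaths = [p for p, d in zip(predicted_risk, died) if d == 1]
    survivors = [p for p, d in zip(predicted_risk, died) if d == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_dead in deaths:
        for risk_alive in survivors:
            if risk_dead > risk_alive:
                concordant += 1.0   # model ranked the death higher
            elif risk_dead == risk_alive:
                concordant += 0.5   # tie counts as half
    return concordant / (len(deaths) * len(survivors))


def risk_range(predicted_mortality_pct: float) -> str:
    """Map predicted mortality (%) to the risk ranges used in the text.

    Assumption: exactly 3% is low risk and exactly 6% is high risk.
    """
    if predicted_mortality_pct <= 3:
        return "low"
    if predicted_mortality_pct < 6:
        return "moderate"
    return "high"
```

With toy data, `auc([0.01, 0.05, 0.02, 0.10], [0, 0, 1, 1])` gives 0.75, because three of the four death-survivor pairs are ranked correctly. An AUC of 0.5 corresponds to ranking no better than chance.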

Ethical Considerations
The study complied with all required ethical principles and was approved by the IDPC Research Ethics Committee (Research Ethics Committee number: 5244; report number: 5.383.227; Certificate of Submission for Ethical Assessment: 57362122.7.0000.5462). Patients had access to a Free and Informed Consent Form, in accordance with the general data protection law and resolution 466/2012.

RESULTS

Baseline Characteristics of the Sample and Subgroups
The study included 438 patients who underwent isolated CABG in the one-year period evaluated. The mean age was 62 ± 8.2 years (17.6% were aged 70 years or older), 26.5% were women, and 76.7% were Caucasian. Table 1 shows the baseline characteristics regarding the clinical background of the sample. Table 2 relates the baseline characteristics at hospitalization to the indication for CABG (stable CAD or ACS subgroups), showing that ACS was a more frequent surgical indication than stable CAD (64.2% and 35.8%, respectively) and that, therefore, urgent and emergency surgeries were the most frequent surgical statuses.

Outcome and Performance Assessment of Predictor Models
The observed operative mortality was 4.3% (19 patients). All deaths occurred during hospitalization (no deaths occurred after discharge within 30 days). Estimated operative mortality was 2.74% and 1.21% according to the EuroSCORE II and STS score models, respectively. Tables 3 and 4 show the calibration of EuroSCORE II and STS, respectively, by analyzing the observed/expected ratio (O/E), which is optimal when closer to 1.0 and considered positive when the P-value is > 0.05. Calibration was also assessed in the specific subgroups of stable CAD and ACS, whose observed mortalities in the study population were 3.2% and 5.0%, respectively. Table 3 reveals that, despite not being ideal (O/E=1.59), the calibration of the European score was positive in the analyzed sample (P>0.05). When calibration was assessed in the subgroups of CAD and ACS patients, it remained positive (with P-values higher than that of the total sample); however, when performance was assessed according to risk ranges, there was a loss of calibration at high risk (> 6%), with a predicted mortality of 11.9% and an observed mortality of 5.9%. Table 4, in turn, shows absent calibration of the STS score in this sample, with a P-value < 0.001 and an O/E > 3.5 in all subgroups evaluated, including stable CAD and ACS and all established risk ranges, which makes its European competitor superior in this regard. Discrimination was assessed both in the total sample and in the stable CAD and ACS subgroups using the ROC curve, as illustrated in Figures 1, 2, and 3.
Figure 1 shows the area under the ROC curve in the total sample and reveals positive discrimination (AUC > 0.5) for the STS score (AUC=0.646; 95% CI 0.532-0.760) and EuroSCORE II (AUC=0.697; 95% CI 0.593-0.802); however, because the AUC values are < 0.75, discrimination is considered poor. While discrimination of the models was very limited in the subgroup of patients admitted due to ACS, as shown in Figure 2 (AUC=0.616 and AUC=0.467 for EuroSCORE II and STS score, respectively), discrimination in patients with stable CAD was highly positive. In this subgroup of individuals with stable CAD, the AUC was > 0.8 for the European model (AUC=0.826; 95% CI 0.661-0.991) and for the North American model (AUC=0.855; 95% CI 0.706-1.0), as illustrated in Figure 3.
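As a rough check of the calibration figures for the total sample, the observed/expected ratio can be recomputed from the aggregate numbers reported in this section (19 deaths among 438 patients; mean predicted mortality of 2.74% for EuroSCORE II and 1.21% for the STS score). This is a sketch on published, rounded summary values rather than patient-level data, and the function name is illustrative.

```python
def observed_expected_ratio(deaths: int, patients: int,
                            mean_predicted_pct: float) -> float:
    """O/E ratio: observed mortality divided by mean predicted mortality."""
    if patients <= 0 or mean_predicted_pct <= 0:
        raise ValueError("patients and predicted mortality must be positive")
    observed_pct = 100.0 * deaths / patients
    return observed_pct / mean_predicted_pct


# Aggregate values as reported in the text (rounded in the source).
oe_euroscore2 = observed_expected_ratio(19, 438, 2.74)
oe_sts = observed_expected_ratio(19, 438, 1.21)
print(f"EuroSCORE II O/E: {oe_euroscore2:.2f}")
print(f"STS score O/E:    {oe_sts:.2f}")
```

The recomputed ratios are approximately 1.58 and 3.59, close to the reported O/E of 1.59 for EuroSCORE II and the O/E > 3.5 for the STS score; the small difference for EuroSCORE II presumably reflects rounding of the published means. An O/E above 1.0 indicates that the model underestimates mortality.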

DISCUSSION

Analysis of the Observed Operative Mortality Outcome
This is an important prospective study to assess the performance of the risk prediction models most recommended by current guidelines in a large cardiac surgery referral center in Brazil. The operative mortality in the analyzed center (4.3%), which is part of the Brazilian public health system (Sistema Único de Saúde [SUS]), proved to be lower than that reported by previous studies in Brazil, in which mortality for CABG is around 6.22% [2], ranging from 5% to 9.4% according to the center evaluated [20,21]. In fact, there was evidence of a reduction in operative mortality in CABG in the service evaluated, since a previous retrospective analysis (1999-2017) showed a rate of 5% [22]. Even so, the mortality rate is still higher than the rates reported in developed countries, which are around 1.8 to 2.7% [11,23]. As the predictor models, EuroSCORE II and STS, were developed from databases of populations whose mortality is much lower than that of underdeveloped countries, the scores lose accuracy when applied to other samples, as shown by previous studies [13], including in Brazilian centers [14-16]. However, in Brazil, the healthcare system is marked by inequality and heterogeneity and, hence, the performance of the two main predictor models at the IDPC was not known. This study, therefore, demonstrated that accuracy was poor for both predictor models (especially for the North American model) in this population as a whole.

Analysis of the Performance of Predictor Models (EuroSCORE II and STS Score)
The EuroSCORE II was found to be superior to the STS score, which is the opposite of what large studies describe [11] and what guidelines recommend [4,5]. While the European model has a positive calibration (O/E=1.59, with P=0.056), the North American score was uncalibrated (O/E=3.6, with P<0.001). Furthermore, the discrimination of the EuroSCORE II (AUC=0.697), although not ideal, is still better than that of the STS score (AUC=0.646). This is probably due to the tendency of the STS score to underestimate operative risk: in a sample where mortality is higher, estimating a lower risk decreases its accuracy. Other studies have also shown that this model generally provides a lower estimated risk than its European counterpart [10,11,14].
Numerous factors may be related to the increase in operative mortality in the analyzed population, with consequent loss of performance of the predictive models. Socioeconomic, cultural, and geographic factors may be associated, but it is not known objectively how these issues have led to increased mortality in this and other studies. Most of the baseline characteristics of the patients in this study are similar to those of other populations found in developed countries, as shown in Table 5, which compares this study population (n=438), the EuroSCORE II sample (n=22,381, recruited in 2010), a relevant study conducted in Italy by Paparella D et al. [24] (n=6,293, analyzed with data from the Puglia Adult Cardiac Surgery Registry from 2011 to 2012, which assessed the accuracy of EuroSCORE II in operative mortality), and the STS analysis published in 2009 (n=774,881, evaluated from 2002 to 2006, in order to update and validate the STS score in that year) [8,9]. Variables related to age, gender, renal function, peripheral arterial disease, and functional class were quite similar. Interestingly, however, the samples differed greatly in terms of surgical urgency. While in the EuroSCORE II, STS score, and Paparella D et al. [24] populations urgent/emergency surgery was indicated in 22.8%, 19.1%, and 50.3%, respectively, our center included 66.4% of patients with urgent surgery. Also, recent infarction was more prevalent in our sample, at 45.9% compared with 16.8% in the study conducted in Italy. In the three foreign analyses (EuroSCORE II, Paparella D et al. [24], and STS score 2009), the predictive models performed well (discrimination represented by AUC > 0.8), which was not the case in our study.
In this sense, the fact that our service performs a large proportion of urgent/emergency surgeries may be associated with increased mortality and with the discrepancy in the predictions of the risk models. This reflects Brazilian socioeconomic inequalities, in which comprehensive primary care is not provided, leading to an increase in the prevalence of underdiagnosed and untreated comorbidities and prompting patients to seek health services at an advanced stage of the disease, such as urgent cases of ACS. Thus, patients arriving with an indication for urgent surgery for ACS are at higher risk. In the study, 73% of deaths occurred in patients admitted for ACS (an indication for urgent surgery), representing a mortality rate of 5% in this subgroup, in contrast to a mortality rate of 3.2% in patients with stable CAD.
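The share of deaths attributed to ACS can be cross-checked against the subgroup sizes given in the figure legends (281 ACS patients and 157 stable CAD patients). The sketch below back-calculates death counts from the rounded subgroup mortality rates; these counts are inferred for illustration and are not reported directly in the text.

```python
# Subgroup sizes from the figure legends; mortality rates as reported (rounded).
acs_patients, cad_patients = 281, 157
acs_mortality, cad_mortality = 0.05, 0.032

# Inferred death counts (assumption: rounding the product to the nearest integer).
acs_deaths = round(acs_mortality * acs_patients)
cad_deaths = round(cad_mortality * cad_patients)
total_deaths = acs_deaths + cad_deaths

acs_share_pct = 100.0 * acs_deaths / total_deaths
print(acs_deaths, cad_deaths, total_deaths, round(acs_share_pct, 1))
```

Under these assumptions the inferred counts are 14 deaths in ACS patients and 5 in stable CAD patients, which sum to the 19 deaths observed and give an ACS share of about 73.7%, in line with the 73% quoted above.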
In addition to the pathophysiological mechanisms related to ACS (prothrombotic and inflammatory state, edema, and myocardial stunning), previously decompensated comorbidities diagnosed during an unplanned hospitalization contribute to a worse outcome. This is reflected in a high admission rate of diabetic patients (54% vs. 25% in EuroSCORE II), insulin-dependent diabetics (31.5% vs. 7.6% in EuroSCORE II) [8], and patients with pulmonary hypertension (32% vs. 18% in the study by Paparella D et al. [24]), indicating deficient primary care. Also, patients admitted with ACS have a higher prevalence of clinical criteria of severity, conferring higher operative mortality, as identified in
Because our sample differs greatly from the populations in which the predictor models were developed regarding the prevalence of urgency and ACS, the performance of EuroSCORE II and the STS score was evaluated in the subgroup of patients admitted only for stable CAD. In this case, very good discrimination was observed for both models, reaching AUC > 0.8 (Figure 3), very similar to the values found in the studies in which they were validated [6,8]. The calibration of EuroSCORE II also proved to be adequate in this subgroup, with P=0.14.
On the other hand, when discrimination was assessed only in ACS patients (Figure 2), both scores were inadequate (AUC=0.616 for EuroSCORE II and AUC=0.467 for the STS score). Thus, the high prevalence of ACS and the consequent surgical urgency contributed to increased mortality and inadequate score performance in our service. Therefore, it is reasonable to state that, in view of the need to perform surgical risk assessment in the center evaluated, EuroSCORE II is the most suitable score for our institution, since it has better calibration and discrimination. However, as previously stated, its performance is much better in patients with stable CAD (excellent discrimination and satisfactory calibration) and, in this population, the use of the European model seems to be reliable.

Fig. 1 - Receiver operating characteristic curve for mortality according to the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) and the Society of Thoracic Surgeons (STS) score for assessment of discrimination capacity in the total sample (n=438). AUC=area under the curve; CI=confidence interval.

Fig. 2 - Receiver operating characteristic curve for mortality according to the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) and the Society of Thoracic Surgeons (STS) score for assessment of discrimination capacity in the subgroup of patients admitted with acute coronary syndrome (n=281). AUC=area under the curve; CI=confidence interval.

Fig. 3 - Receiver operating characteristic curve for mortality according to the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) and the Society of Thoracic Surgeons (STS) score for assessment of discrimination capacity in the subgroup of patients admitted with stable coronary artery disease (n=157). AUC=area under the curve; CI=confidence interval.

Table 1. Baseline clinical characteristics of the sample (n=438).
BMI=body mass index; BSA=body surface area; CAD=coronary artery disease; DM=diabetes mellitus; IDDM=insulin-dependent diabetes mellitus; PAD=peripheral artery disease; PCI=percutaneous coronary intervention; SAH=systemic arterial hypertension; SD=standard deviation
*Carotid stenosis, amputation, approach to aneurysm of the abdominal aorta or other arteries
†Stroke, transient ischaemic attack, carotid stenosis, previous approach to carotid stenosis

Table 2. Comparison of baseline characteristics (n=438) at admission (clinical, laboratory, echocardiographic, and anatomical) according to indication for coronary artery bypass grafting (stable coronary artery disease and acute coronary syndrome subgroups).

Table 3. EuroSCORE II calibration in the total sample, in the subgroups according to the indication for myocardial revascularization (stable coronary artery disease and acute coronary syndrome) and according to the risk ranges.

Table 4. Calibration of the Society of Thoracic Surgeons score in the total sample, in subgroups according to the indication for coronary artery bypass grafting (stable coronary artery disease and acute coronary syndrome) and according to risk ranges.
ACS=acute coronary syndrome; CAD=coronary artery disease
P-value for test of adherence of observed mortality to predicted mortality

Table 5. Comparison of baseline characteristics of this study (n=438) with the sample of EuroSCORE II (n=22,381), Paparella et al. (n=6,293), and the STS published in 2009 (n=774,881).
AUC=area under the curve; BMI=body mass index; CABG=coronary artery bypass grafting; CCS=Canadian Cardiovascular Society; EuroSCORE II=European System for Cardiac Operative Risk Evaluation II; IDDM=insulin-dependent diabetes mellitus; IDPC=Instituto Dante Pazzanese de Cardiologia; NYHA=New York Heart Association; PAD=peripheral artery disease; PASP=pulmonary artery systolic pressure; NR=not reported; SD=standard deviation; STS=Society of Thoracic Surgeons
Pulmonary hypertension defined when PASP > 30 mmHg
Creatinine clearance calculated according to the Cockcroft-Gault formula
*Sample from the study that developed and validated EuroSCORE II
†STS sample published in 2009 to update and validate the STS score that year
‡Paparella et al. sample with data collected from the Puglia Adult Cardiac Surgery Registry
§Discrimination of the predictor model by AUC